Persuasion modeling is a key building block for conversational agents. Existing work in this direction is limited to analyzing textual dialogue corpora. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance-level annotations of persuasion strategy, and game-level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines. Code and data available at https://github.com/rehg-lab/dope_selfsup.
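We introduce a new problem: anticipating a time series of future hand masks from egocentric video. A key challenge is modeling the stochasticity of future head motion, which globally affects the analysis of video from a head-worn camera. To this end, we propose a novel deep generative model, EgoGAN, which uses a 3D fully convolutional network to learn a spatio-temporal video representation for visual anticipation, generates future head motion using a generative adversarial network (GAN), and then predicts future hand masks from the video representation and the generated future head motion. We evaluate our method on the EPIC-Kitchens and EGTEA Gaze+ datasets. We conduct detailed ablation studies to validate the design choices of our approach. In addition, we compare our method with previous methods for future image segmentation and show that it predicts future hand masks more accurately.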
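Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, (Martins et al. 2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. (Farinhas et al. 2021) extended this to Gaussian mixture attention densities, a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has approximation capabilities similar to those of kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.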
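The opening claim of this abstract, that attention is an expectation of a data representation under probability weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian density restricted to [0, 1], and the midpoint-rule integration are all assumptions chosen for illustration, and the kernel (deformed) exponential family densities from the paper are not implemented here.

```python
import math

def softmax(scores):
    """Turn raw scores into probability weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def discrete_attention(scores, values):
    """Expectation of the value vectors under the softmax weights."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def gaussian_density(t, mu, sigma):
    """Unimodal density used here as an illustrative attention density."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def continuous_attention(value_fn, mu, sigma, lo=0.0, hi=1.0, steps=2000):
    """Approximate E_p[value_fn(t)] on [lo, hi] with the midpoint rule.

    The Gaussian is renormalised on [lo, hi] so the weights integrate to 1
    (an assumption of this sketch, not a detail taken from the abstract).
    """
    dt = (hi - lo) / steps
    ts = [lo + (i + 0.5) * dt for i in range(steps)]
    dens = [gaussian_density(t, mu, sigma) for t in ts]
    mass = sum(dens) * dt
    return sum(d * value_fn(t) for d, t in zip(dens, ts)) * dt / mass
```

With equal scores, `discrete_attention` returns the plain average of the value vectors. In the continuous case, a density centred at `mu` pulls the summary toward `value_fn(mu)`, and a smaller `sigma` makes the summary more local.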
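Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be used to improve EMA delivery and to develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: (1) input representation, (2) encoding temporal information, and (3) the utility of pre-training for improving downstream prediction task performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make a predictive model trained on 40K EMA samples freely available to the research community to facilitate the development of future transformer-based EMA analysis work.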
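For readers unfamiliar with the AUC figure reported above, the following sketch shows one standard way to compute ROC AUC: as the fraction of (positive, negative) pairs that the model ranks correctly, counting ties as half. The function name and the label convention (1 for non-response, 0 for response) are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise ranking (Mann-Whitney U statistic).

    labels: iterable of 0/1, where 1 marks the positive class
            (assumed here to be a non-response).
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

Under this definition, a score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.77 means that a randomly chosen positive example outscores a randomly chosen negative one about 77% of the time.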
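Given a video captured from a first-person perspective and the environment in which the video was recorded, can we recognize what the person is doing and identify where the action occurs in 3D space? We address this challenging problem of jointly recognizing and localizing actions on a known 3D map from egocentric video. To this end, we propose a novel deep probabilistic model. Our model takes as input a Hierarchical Volumetric Representation (HVR) of the 3D environment and an egocentric video, treats the 3D action location as a latent variable, and recognizes the action based on the video and on contextual cues around its potential locations. To evaluate our model, we conduct extensive experiments on a subset of the Ego4D dataset, which captures both naturalistic human actions and photo-realistic 3D environment reconstructions. Our method demonstrates strong results on both action recognition and 3D action localization across seen and unseen environments. We believe our work points to an exciting research direction at the intersection of egocentric vision and 3D scene understanding.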
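Continual learning is known to suffer from catastrophic forgetting, a phenomenon in which earlier learned concepts are forgotten at the expense of more recent samples. In this work, we challenge the assumption that continual learning is inevitably associated with catastrophic forgetting by presenting a set of tasks that, surprisingly, do not suffer from catastrophic forgetting when learned continually. We provide evidence that these reconstruction-type tasks exhibit positive forward transfer and that single-view shape reconstruction improves performance on both learned and novel categories over time. We provide a novel analysis of knowledge transfer ability by looking at the output distribution shift across sequential learning tasks. Finally, we show that the robustness of these tasks leads to the possibility of using them as a proxy representation learning task for continual classification. The codebase, dataset, and pre-trained models released with this paper can be found at https://github.com/rehg-lab/lrorec.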
One of the major errors affecting GNSS signals in urban canyons is GNSS multipath error. In this work, we develop a Gazebo plugin that uses a ray tracing technique to account for multipath effects in a virtual urban canyon environment using virtual satellites. The plugin balances accuracy and computational complexity so that the simulation runs in real time for both software-in-the-loop (SITL) and hardware-in-the-loop (HITL) testing. We also construct a 3D virtual environment of Hong Kong and compare the results from our plugin with the GNSS data in the publicly available Urban-Nav dataset to validate the efficacy of the proposed Gazebo plugin. The plugin is openly available to researchers in the robotics community at https://github.com/kpant14/multipath_sim
In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich labeling on malicious and benign PE files using computationally expensive, disassembly-based malicious capabilities. Using these capabilities, we derive several different types of metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. We then examine performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that for several tasks, low-dimensional, computationally efficient metric embeddings maintain performance with little decay, which offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced storage overhead. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and the introduction of task-specific auxiliary objectives to improve performance on mission-critical tasks.
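As a rough illustration of the "contrastive loss" named above, the sketch below implements the classic pairwise margin formulation: similar pairs are pulled together and dissimilar pairs are pushed at least a margin apart. The abstract does not say which variant the authors use, so this formulation, the function names, and the default `margin=1.0` are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Pairwise margin-based contrastive loss for one pair of embeddings.

    similar=True  -> penalise any distance between the pair (d^2).
    similar=False -> penalise only pairs closer than `margin`
                     (max(0, margin - d)^2).
    """
    d = euclidean(emb_a, emb_b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In a setting like the one described, a "similar" pair could be two PE files that share malicious capabilities, but how pairs are actually formed is not stated in the abstract.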